Lab 5: Factors in Visualizations

Author

YOUR NAME


We will be working with the surveys.csv data from Lab 2: Exploring Rodents with ggplot2.

library(tidyverse)
library(lubridate)

# fix the file path for your file folder structure
surveys <- read_csv("../lab2/surveys.csv")

This lab comes from the plotting “best practices” that I’ve learned over the years. The main inspiration is Will Chase’s 2020 RStudio Conference Presentation – Glamour of Graphics.

Note

Fun fact! The inspiration for the presentation title came from Dr. Kelly Bodwin!

1 Revisiting Lab 2: Exploring Rodents with ggplot2

Let’s start with the side-by-side boxplots you created in Lab 2 to visualize the distribution of weight within each species (not species ID!).

surveys |> 
  ggplot(mapping = aes(x = weight,
                       y = species)) + 
  geom_jitter(color = "steelblue",
              alpha = 0.05) + 
  geom_boxplot(alpha = 0.2,
               outlier.shape = NA)

As you should expect with a character variable, the boxplots go in alphabetical order. This looks rather jumbled, so let’s put our factor skills to work!

1. Reorder the boxplots so the weights go in descending order.

Caution

You are required to use functions from forcats to complete this task.
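As a reminder of how reordering by another variable works, here is a minimal sketch on made-up data. The `toy` data frame and its values are hypothetical and are not taken from surveys.csv; only `fct_reorder()` and its `.desc` argument come from forcats.

```r
library(forcats)

# Toy data (hypothetical values, not from surveys.csv)
toy <- data.frame(
  species = c("albigula", "albigula", "flavus", "flavus", "merriami", "merriami"),
  weight  = c(150, 170, 7, 9, 40, 46)
)

# A character variable converted to a factor gets alphabetical levels
levels(factor(toy$species))

# fct_reorder() sorts the levels by a summary of a second variable
# (the median, by default), smallest first
by_weight <- fct_reorder(toy$species, toy$weight)
levels(by_weight)

# .desc = TRUE reverses the direction of the sort
by_weight_desc <- fct_reorder(toy$species, toy$weight, .desc = TRUE)
levels(by_weight_desc)
```

Keep in mind that ggplot2 draws the first level at the bottom of a discrete y-axis, so check the plot itself to decide which direction reads as "descending".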

2. Now that you’ve reordered, let’s fix our axis labels and title. Make sure your labels/title are meaningful. Also, let’s take Will Chase’s advice and move the y-axis label to the top of the plot.
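One common way to move the y-axis label to the top of a plot is to drop the axis title and reuse the subtitle for it. The sketch below shows that idea on made-up data. The `toy` data frame and all label text are hypothetical, and this is one option rather than the required approach.

```r
library(ggplot2)

# Toy data (hypothetical values)
toy <- data.frame(
  group = c("a", "b", "c"),
  value = c(3, 5, 2)
)

p <- ggplot(toy, aes(x = value, y = group)) +
  geom_col() +
  labs(
    title    = "Toy values by group",
    subtitle = "Group",          # stands in for the y-axis label
    x        = "Value (units)",
    y        = NULL              # removes the rotated y-axis title
  ) +
  # Align the title and subtitle with the left edge of the whole plot
  theme(plot.title.position = "plot")

p
```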

2 Time-Series Plot

Warning

This is a new section, meaning you will create a new plot. In other words, this is not a continuation of the boxplot from above.

This week, we are focusing on learning skills related to dates, but we have yet to make a very common type of plot – the time-series plot. We’ll use this plot to motivate a second type of factor reordering!

3. Create a visualization of how weights vary for each genus over the duration of the study.

Tip

What variables do you need and what are their variable types? What aesthetic would it make sense to map each variable to? Sketch it out first!

Avoid using faceting here.

Alright, there are a lot of measurements over time! Let’s use our dplyr skills to summarize each year and plot the summaries.

4. Calculate and plot the mean weight for each year (for each genus).
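The summarizing step can be sketched on made-up data as follows. The column names `genus`, `year`, and `weight` are assumptions about how your data are organized, and the values are hypothetical.

```r
library(dplyr)

# Toy data (hypothetical values); one missing weight to show na.rm
toy <- tibble(
  genus  = c("A", "A", "A", "B", "B", "B"),
  year   = c(2000, 2000, 2001, 2000, 2001, 2001),
  weight = c(10, NA, 14, 40, 44, 48)
)

# One row per genus-year combination, holding the mean weight
yearly_means <- toy |>
  group_by(genus, year) |>
  summarize(mean_weight = mean(weight, na.rm = TRUE),
            .groups = "drop")

yearly_means
```

The resulting data frame has one row per line segment endpoint, which is the shape a line plot needs.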

This should look much less busy! However, you should notice that the legend still goes in alphabetical order.

5. Reorder the lines in the legend so the genera appear in descending order of weight.

Caution

You are required to use functions from forcats to complete this task.
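For line plots, forcats provides `fct_reorder2()`, which orders levels using two variables at once. By default it sorts by the y value that goes with the largest x value, in descending order, so the legend lines up with the right-hand ends of the lines. The sketch below uses made-up data; the `yearly` data frame and its column names are hypothetical.

```r
library(forcats)

# Toy yearly summaries (hypothetical values)
yearly <- data.frame(
  genus       = c("A", "A", "B", "B", "C", "C"),
  year        = c(2000, 2001, 2000, 2001, 2000, 2001),
  mean_weight = c(10, 12, 50, 45, 20, 30)
)

# Alphabetical by default
levels(factor(yearly$genus))

# Order by mean_weight in the final year (largest first)
legend_order <- fct_reorder2(yearly$genus, yearly$year, yearly$mean_weight)
levels(legend_order)
```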

6. Now that you’ve reordered the lines, look at your labels and title. Let’s give the legend, axes, and plot new titles and move the y-axis label to the top of the plot.

3 Captures over the Week

For our final exploration, we will consider the number of rodents captured throughout the week – transitioning to visualizations of categorical variables.

7. Sketch out a game plan for visualizing the number of rodents captured each day of the week. Include your game plan below.

8. Implement your game plan and create a visualization of the number of rodents captured each day of the week.

As you might have expected, the ordering of the days of the week is not what we would like.

9. Change the order of the day of the week to go Monday through Sunday.

Note

You can choose to keep the days named as they are (e.g., Mon, Sun), or you can choose to rename the days to their full names (e.g., Monday, Sunday).
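When the order you want is not based on another variable, you can specify the levels by hand. The sketch below uses a made-up vector of abbreviated day names; the vector `day_of_week` is hypothetical, and your data may store the days differently.

```r
library(forcats)

# Toy vector of abbreviated day names (hypothetical)
day_of_week <- c("Sun", "Mon", "Sat", "Tue", "Fri", "Wed", "Thu", "Mon")

# Alphabetical by default: Fri, Mon, Sat, ...
levels(factor(day_of_week))

# fct_relevel() moves the named levels to the front, in the order given
days_ordered <- fct_relevel(day_of_week,
                            "Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")
levels(days_ordered)

# fct_recode() renames levels (new = "old") without changing their order
days_full <- fct_recode(days_ordered,
                        Monday = "Mon", Tuesday = "Tue", Wednesday = "Wed",
                        Thursday = "Thu", Friday = "Fri",
                        Saturday = "Sat", Sunday = "Sun")
levels(days_full)
```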

Warning

Your plot should not make people tilt their heads to read it! Be mindful in choosing which variable goes on which axis and if / how you use axis labels.

It should be very clear that there are more rodents captured on the weekend than during the week. But, let’s explore if this is still the case if we use a “Weekday” / “Weekend” classification system instead.

10. Collapse Monday through Friday into a "Weekday" level, and collapse Saturday and Sunday into a "Weekend" level. Plot the number of rodents captured in each of the two groups.

Warning

Your plot should not make people tilt their heads to read it! Be mindful in choosing which variable goes on which axis and if / how you use axis labels.
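Collapsing several levels into one can be sketched with `fct_collapse()` on made-up data. The `days` vector below is hypothetical and uses abbreviated names; adjust the level names to match however the days appear in your data.

```r
library(forcats)

# Toy vector containing every day at least once (hypothetical)
days <- c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun", "Sat", "Sun", "Mon")

# Each argument is new_level = c(old levels to merge)
day_type <- fct_collapse(days,
                         Weekday = c("Mon", "Tue", "Wed", "Thu", "Fri"),
                         Weekend = c("Sat", "Sun"))

levels(day_type)

# Number of observations in each collapsed level
table(day_type)
```

When comparing the two groups, remember that "Weekday" pools five days while "Weekend" pools only two.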


Caution

Don’t forget to submit your final project group formation survey this week! Due at the same time as this lab!